Class 8 Lecture Demonstration Lab

Part I - Review of Vulnerability Model:

  • Review Vulnerability Model:

  • Mapping Model Components:

Part II - Explore SVI vulnerability dataset + Histogram and Statistic Tool

  • CDC SVI 2018:

  • To start, load the CDC SVI dataset for New York State into QGIS. This is the same data that will be used for the social vulnerability component of the assignment this week:

  • The first map will use RPL_THEMES, the overall percentile ranking, scaled from 0 to 1:

Mapped via Natural Breaks, 5 classes. Note the -999 class, which is the dataset's no-data placeholder and should be visually separated from the other values.

Note: The SPL_THEMES column in the attribute table is the sum of series themes, before it is scaled into the percentile ranking in the RPL_THEMES column.

  • In order to see the correct histogram for the real values of column RPL_THEMES, the -999 value should be removed. This has been completed in the feature set exclude_999:

The 0 to 1 scaled score is readily evident once the -999 value has been removed from the dataset
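The effect of the -999 placeholder on a histogram or a min/max can be illustrated outside QGIS as well. The following is a minimal Python sketch, not the procedure used to build exclude_999; the sample values and the helper name exclude_no_data are hypothetical.

```python
# Hypothetical sample of RPL_THEMES values; -999 stands in for "no data".
NO_DATA = -999


def exclude_no_data(values, sentinel=NO_DATA):
    """Return only the real values, dropping the no-data sentinel."""
    return [v for v in values if v != sentinel]


rpl_themes = [0.12, -999, 0.87, 0.45, -999, 1.0, 0.0]
clean = exclude_no_data(rpl_themes)

# With the sentinel included, the minimum is -999 and the real 0 to 1
# values are squeezed into one end of any histogram.
print("raw   min/max:", min(rpl_themes), max(rpl_themes))
# With the sentinel removed, the true range of the column is visible.
print("clean min/max:", min(clean), max(clean))
```

In QGIS itself, the same idea corresponds to filtering the layer on an expression such as "RPL_THEMES" != -999 before opening the histogram.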

  • Next, utilize the statistics tool on the RPL_THEMES column:

Statistics Tool

Statistics Profile for variable RPL_THEMES

  • Next, we will run the same statistics tool overview on the column EP_AGE17:

Data Dictionary Listing for column EP_AGE17

Statistics Profile for variable EP_AGE17
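For reference, the kind of summary the statistics tool reports can be reproduced with a few lines of Python. This is only a sketch: the sample values are hypothetical (chosen so that the minimum and maximum match the 0 and 65.1 reported for EP_AGE17), and the function name profile is ours, not a QGIS name.

```python
import statistics


def profile(values):
    """Return a small statistics profile for one numeric column."""
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "range": max(values) - min(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev_sample": statistics.stdev(values),
    }


# Hypothetical EP_AGE17 values (percent of persons aged 17 and younger).
ep_age17 = [0.0, 12.5, 20.0, 22.5, 65.1]
stats = profile(ep_age17)

for name, value in stats.items():
    print(f"{name}: {value}")
```

The minimum and maximum are the two numbers carried forward into the scaling steps that follow.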

Part III - Conduct a Linear Scale Interpolation on column EP_AGE17

As we’ve seen in Part II, EP_AGE17 has the following minimum and maximum values:

  • MIN = 0

  • MAX = 65.1

  • The values range from zero to slightly above 65. With linear interpolation, the goal is to ‘remap’ these original values to (usually) a more standardized, ‘simple’ scale. For our purposes we will map the values to a 1 to 10 scale.

scale_linear transforms a given value from an input domain to an output range using linear interpolation.

  • With the Field Calculator open, we will create a new column LIN_S_17, type integer, and populate using a linear scale math function as follows:

scale_linear("EP_AGE17", 0, 65.1, 1, 10)

Field Calculator with scale_linear function

Check resulting remapped values via statistics tool
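To make the arithmetic behind the expression concrete, here is a minimal Python sketch of linear interpolation from the input domain 0 to 65.1 onto the output range 1 to 10. It is a re-implementation for illustration, not the QGIS source. We assume that values outside the input domain are held at the nearest output bound; that behaviour is worth confirming in the QGIS function help.

```python
def scale_linear(value, domain_min, domain_max, range_min, range_max):
    """Map value from [domain_min, domain_max] onto [range_min, range_max]."""
    # Assumption: out-of-domain inputs are clamped to the output bounds.
    if value <= domain_min:
        return range_min
    if value >= domain_max:
        return range_max
    fraction = (value - domain_min) / (domain_max - domain_min)
    return range_min + fraction * (range_max - range_min)


# The minimum, the midpoint, and the maximum of EP_AGE17.
for ep_age17 in (0, 32.55, 65.1):
    print(ep_age17, "->", scale_linear(ep_age17, 0, 65.1, 1, 10))
```

The midpoint of the domain lands halfway along the output range, at about 5.5. Because LIN_S_17 is created as an integer column, fractional results like this one are not stored as-is; a decimal column would be needed to keep them.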

Part IV - Conduct a Min - Max Normalization on column EP_AGE17

  • While Linear Interpolation is simple and quick, it's not necessarily the most precise scaling method available. The resulting scale may not capture the breakpoints best suited to a particular mapping output. Another typical approach is Min - Max Normalization, which uses the following equation:

  • Formula = X new = (X – X min) / (X max – X min)

  • Using the formula, the following expression can be applied to a new column in Field Calculator titled MM_17, type decimal:

  • As applied in new column MM_17 = ("EP_AGE17" - 0) / (65.1 - 0)

Field Calculator with min - max formula applied

  • As expected, the statistics tool shows the 0 to 1 range of the new normalization:

Statistics Profile for MM_17 column
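The same normalization can be sketched in Python for a whole column at once. In this sketch the minimum and maximum are taken from the data rather than typed in by hand, as they are in the Field Calculator expression; the sample values and the function name min_max_normalize are hypothetical.

```python
def min_max_normalize(values):
    """Rescale values to 0..1 using X new = (X - X min) / (X max - X min)."""
    x_min, x_max = min(values), max(values)
    if x_max == x_min:
        raise ValueError("all values are identical; the range is zero")
    return [(x - x_min) / (x_max - x_min) for x in values]


# Hypothetical EP_AGE17 values spanning the reported 0 to 65.1 range.
ep_age17 = [0.0, 13.02, 32.55, 65.1]
mm_17 = min_max_normalize(ep_age17)

for original, scaled in zip(ep_age17, mm_17):
    print(original, "->", round(scaled, 3))
```

The smallest value always maps to 0 and the largest to 1, which is what the statistics profile for MM_17 confirms.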

  • Min - Max Normalization can also be performed for a custom range, e.g. 1 to 10 as opposed to the 0 to 1 range we have used thus far. To rescale to an arbitrary range [a, b], the formula becomes:

  • Formula = X new = a + ((X – X min) × (b – a)) / (X max – X min)

Note: a and b are the minimum and maximum values of the new normalized range in the above formula.
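As a worked example of the custom-range case, the sketch below rescales EP_AGE17 to [1, 10] using the standard form X new = a + (X – X min)(b – a) / (X max – X min). The function name min_max_range is hypothetical. In the Field Calculator, the equivalent arithmetic would be 1 + ("EP_AGE17" - 0) * (10 - 1) / (65.1 - 0).

```python
def min_max_range(x, x_min, x_max, a, b):
    """Rescale x from [x_min, x_max] onto the custom range [a, b]."""
    if x_max == x_min:
        raise ValueError("x_min and x_max must differ")
    return a + (x - x_min) * (b - a) / (x_max - x_min)


# EP_AGE17 minimum, midpoint, and maximum rescaled to the range 1..10.
for ep_age17 in (0, 32.55, 65.1):
    print(ep_age17, "->", round(min_max_range(ep_age17, 0, 65.1, 1, 10), 3))
```

Setting a = 0 and b = 1 reduces this to the basic Min - Max formula. Note that, algebraically, for values inside the input domain this is the same straight-line mapping as scale_linear("EP_AGE17", 0, 65.1, 1, 10) from Part III.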

Part V - Beyond Normalization - How to create Clusters of Similarity

  • The purpose of normalization is typically to normalize multiple variables across different measurements in order to create consistency in a final scoring mechanism. We can hand over a final scoring process to a clustering algorithm that will ‘compare’ or ‘evaluate’ the values in one variable column to those of other variable columns, resulting in ‘similar’ groupings or clusters. The disadvantage of this approach is that the resulting clusters don’t show ‘directionality’; that is, we can’t see in the categorical result of the clustering process any ‘more’ or ‘less’ measurement, just that the resulting clusters are of similar pattern between the variables inside the cluster and dissimilar outside the cluster.

With clustering, also known as group analysis, we may want to find cluster patterns between two variables, as follows:

  • Housing in structures with 10 or more units - E_MUNIT

  • Persons aged 17 and younger - E_AGE17

  • In QGIS there is a clustering tool that utilizes several algorithm options. For the demonstration we will use the K Means method in the tool.

  • First, add the plugin to QGIS - Attribute based clustering:

Attribute based clustering

  • Second, populate the tool as follows. Note that normalize attributes is toggled ON; in effect this instructs the tool to rescale each attribute so that a variable with a larger numeric range does not dominate the resulting clusters:

Attribute based clustering Parameters

  • Third, when the tool completes the cluster analysis, the new class variable can be mapped via categorical classification:

class categorical values mapped

  • Fourth, review the results. Class 2 is a group with high values on the E_MUNIT variable, which is consistent with what we know of Mid and Lower Manhattan, where housing consists almost entirely of large apartment buildings:

class categorical values mapped
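To show what the clustering step does conceptually, here is a small, self-contained K Means sketch in Python. It is not the plugin's implementation, whose internals are not covered here. The tract values are hypothetical, and the starting centroids are chosen with a simple farthest-first rule (an assumption made to keep the example deterministic). Each attribute is min-max normalized first, mirroring the normalize attributes option.

```python
import math


def normalize_columns(rows):
    """Min-max rescale each column to 0..1 so no attribute dominates distance."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    return [
        tuple((v - lo) / (hi - lo) if hi > lo else 0.0
              for v, lo, hi in zip(row, lows, highs))
        for row in rows
    ]


def farthest_first(points, k):
    """Pick k starting centroids: the first point, then the farthest ones."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    return centroids


def k_means(points, k, max_iterations=100):
    """Return one cluster label per point using the basic K Means loop."""
    centroids = farthest_first(points, k)
    labels = []
    for _ in range(max_iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Update step: each centroid moves to the mean of its members.
        updated = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            updated.append(
                tuple(sum(dim) / len(members) for dim in zip(*members))
                if members else centroids[c]
            )
        if updated == centroids:
            break
        centroids = updated
    return labels


# Hypothetical tracts as (E_MUNIT, E_AGE17): three dominated by large
# multi-unit buildings, three with few such units and more children.
tracts = [(2500, 300), (2700, 350), (2600, 280),
          (50, 900), (80, 1100), (30, 1000)]
labels = k_means(normalize_columns(tracts), k=2)
print(labels)
```

As in the mapped result, the labels are only categories: they say which tracts resemble each other, not which group ranks ‘higher’ or ‘lower’.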